Outlier Detection with Two-Stage Area-Descent Method for Linear Regression
Abstract
Outlier detection is an important task in many applications; it can lead to the discovery of unexpected, useful, or interesting objects in data analysis. Many outlier detection methods are available, but they are limited by distributional assumptions or rely on many patterns to detect a single outlier. Often the distribution is not known, or experimental results do not provide enough information about a data set to determine a particular distribution. Previous work on outlier detection based on area-descent focused on outliers that are isolated; it cannot detect outliers that are clustered together. In this paper, we propose a new approach to outlier detection based on the two-stage area-descent of a convex-hull polygon. It not only detects outliers that are clustered together but also shows their location relative to the data set. Instead of removing an outlier, this relative location provides a suitable direction for moving the outlier to reduce its effect on the linear regression. In addition, the method does not depend on the distribution of the data set.
Keywords: outlier detection, convex-hull, polygon, area-descent, linear regression.
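The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general area-descent idea it builds on: remove each convex-hull vertex in turn and measure how much the hull area shrinks. It assumes 2-D points and covers the single-point case only, not the paper's two-stage procedure for clustered outliers. The function names (`convex_hull`, `polygon_area`, `area_descent`) and the sample data are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0 for fewer than three vertices."""
    n = len(poly)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def area_descent(points: List[Point]) -> Dict[Point, float]:
    """Map each hull vertex to the relative hull-area drop caused by removing it."""
    hull = convex_hull(points)
    total = polygon_area(hull)
    drops: Dict[Point, float] = {}
    for v in hull:
        rest = [p for p in points if p != v]
        reduced = polygon_area(convex_hull(rest))
        drops[v] = (total - reduced) / total if total > 0 else 0.0
    return drops


if __name__ == "__main__":
    # Illustrative data: points close to the line y = x plus one isolated point.
    sample = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.8), (3.0, 3.1),
              (4.0, 3.9), (5.0, 5.2), (6.0, 5.8), (2.0, 9.0)]
    for vertex, drop in sorted(area_descent(sample).items(), key=lambda kv: -kv[1]):
        print(vertex, round(drop, 3))
```

A hull vertex whose removal causes a disproportionately large area drop is a candidate outlier. In this sketch, two outliers lying close together would each cause only a small drop when removed alone, which matches the limitation of single-stage area-descent that the abstract describes.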
Similar resources
Selection of Best Outlier Detection Method Using Regression Analysis
Outliers are unusual data values that are inconsistent with most of the records. Such non-representative records can seriously affect the model to be produced, so detecting outliers is an important step toward higher accuracy. Several outlier detection methods are used in the literature for real as well as simulated data sets. The aim of this study is to compare the two outlier detection method ...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Outlier Detection by Boosting Regression Trees
A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...
Outlier Detection Methods in Multivariate Regression Models
Outlier detection statistics based on two models, the case-deletion model and the mean-shift model, are developed in the context of a multivariate linear regression model. These are generalizations of the univariate Cook’s distance and other diagnostic statistics. Approximate distributions of the proposed statistics are also obtained to get suitable cutoff points for significance tests. In addi...
Application of Recursive Least Squares to Efficient Blunder Detection in Linear Models
In many geodetic applications a large number of observations are measured to estimate the unknown parameters. The unbiasedness of the estimated parameters is only ensured if there are no biases (e.g. systematic effects) or falsifying observations, which are also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...